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ABSTRACT 

This  paper  is  concerned  with  the  physics-based  simulation  of  light  tracked  vehicles  operating  on  rough 
deformable  terrain.  The  focus  is  on  small  autonomous  vehicles,  which  weigh  less  than  100  lb  and  move  on 
deformable  and  rough  terrain  that  is  feature  rich  and  no  longer  representable  using  a  continuum  approach. 
A  scenario  of  interest  is,  for  instance,  the  simulation  of  a  reconnaissance  mission  for  a  high  mobility 
lightweight  robot  where  objects  such  as  a  boulder  or  a  ditch  that  could  otherwise  be  considered  small 
for  a  truck  or  tank,  become  major  obstacles  that  can  impede  the  mobility  of  the  light  autonomous  vehicle 
and  negatively  impact  the  success  of  its  mission.  Analyzing  and  gauging  the  mobility  and  performance  of 
these  light  vehicles  is  accomplished  through  a  modeling  and  simulation  capability  called  Chrono: : Engine. 
Chrono: .'Engine  relies  on  parallel  execution  on  Graphics  Processing  Unit  (GPU)  cards. 


1  INTRODUCTION 

Engineers  are  increasingly  relying  on  simulation 
to  augment  and,  in  some  cases,  replace  costly  and 
time  consuming  experimental  work.  However,  cur¬ 
rent  simulation  capabilities  are  sometimes  inadequate 
to  capture  phenomena  of  interest.  In  tracked  vehi¬ 
cle  analysis,  for  example,  the  interaction  of  the  track 
with  granular  terrain  has  been  difficult  to  character¬ 
ize  through  simulation  due  to  the  prohibitively  long 
simulation  times  associated  with  many-body  dynam¬ 
ics  problems.  This  is  the  generic  name  used  here  to 


characterize  dynamic  systems  with  a  large  number  of 
bodies  encountered,  for  instance,  when  one  adopts  a 
discrete  representation  of  the  terrain  in  vehicle  dy¬ 
namics  problems.  However,  these  many-body  dynam¬ 
ics  problems  can  now  capitalize  on  recent  advances 
in  the  microprocessor  industry  that  are  a  consequence 
of  Moore’s  law,  of  doubling  the  number  of  transis¬ 
tors  per  unit  area  roughly  every  18  months.  Specifi¬ 
cally,  until  recently,  access  to  massive  computational 
power  on  parallel  supercomputers  has  been  the  priv¬ 
ilege  of  a  relatively  small  number  of  research  groups 
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in  a  select  number  of  research  facilities,  thus  limiting 
the  scope  and  impact  of  high  performance  comput¬ 
ing  (HPC).  This  scenario  is  rapidly  changing  due  to 
a  trend  set  by  general-purpose  computing  on  graph¬ 
ics  processing  unit  (GPU)  cards.  NVIDIA’s  CUDA 
library  [20]  allows  one  to  use  the  streaming  multipro¬ 
cessors  available  in  high-end  graphics  cards.  In  this 
setup,  a  latest  generation  NVIDIA  GPU  Kepler  card 
will  reach  1.5  Teraflops  by  the  end  of  2012  owing  to  a 
set  of  1536  scalar  processors  working  in  parallel,  each 
following  a  Single  Instruction  Multiple  Data  (SIMD) 
execution  paradigm.  Despite  having  only  1536  scalar 
processors,  such  a  card  is  capable  of  managing  tens  of 
thousands  of  parallel  threads  at  any  given  time.  This 
overcommitting  of  the  GPU  hardware  resources  is  at 
the  cornerstone  of  a  computing  paradigm  that  aggres¬ 
sively  attempts  to  hide  costly  memory  transactions 
with  useful  computation,  a  strategy  that  has  lead,  in 
frictional  contact  dynamics  simulation,  to  a  one  order 
of  magnitude  reduction  in  simulation  time  for  many- 
body  systems  [19,33]. 

The  challenge  of  using  parallel  computing  to  re¬ 
duce  simulation  time  and/or  increase  system  size 
stems,  for  the  most  part,  from  the  task  of  design¬ 
ing  and  implementing  many-body  dynamics  specific 
parallel  numerical  methods.  Designing  parallel  algo¬ 
rithms  suitable  for  frictional  contact  many-body  dy¬ 
namics  simulation  remains  an  area  of  active  research. 
Results  reported  in  [15]  indicate  that  the  most  widely 
used  commercial  software  package  for  multibody  dy¬ 
namics  simulation,  which  draws  on  a  so  called  penalty 
or  regularization  approach,  runs  into  significant  diffi¬ 
culties  when  handling  simple  problems  involving  hun¬ 
dreds  of  contact  events,  and  thus  cases  with  thousands 
of  contacts  become  intractable.  Unlike  these  penalty 
or  regularization  approaches  where  the  frictional  in¬ 
teraction  is  represented  by  a  collection  of  stiff  springs 
combined  with  damping  elements  that  act  at  the  inter¬ 
face  of  the  two  bodies  [10,21,27,28],  the  approach 
embraced  herein  draws  on  a  different  mathematical 
framework.  Specifically,  the  parallel  algorithms  rely 
on  time-stepping  procedures  producing  weak  solu¬ 
tions  of  the  differential  variational  inequality  (DVI) 


problem  that  describes  the  time  evolution  of  rigid  bod¬ 
ies  with  impact,  contact,  friction,  and  bilateral  con¬ 
straints.  When  compared  to  penalty-methods,  the  DVI 
approach  has  a  greater  algorithmic  complexity,  but 
avoids  the  small  time  steps  that  plague  the  former  ap¬ 
proach. 

The  task  of  presenting  this  class  of  algorithms 
and  their  parallel  implementation  is  organized  as  fol¬ 
lows.  Section  2  provides  a  brief  description  of  the 
general  equations  that  capture  the  dynamics  of  many- 
body  systems.  This  section  also  contains  an  out¬ 
line  of  the  parallel  method  embraced  to  numerically 
solve  the  equations  of  motion.  One  of  the  challeng¬ 
ing  components  of  the  solution  method  is  the  colli¬ 
sion  detection  step  required  to  determine  the  set  of 
contacts  active  in  the  many-body  system.  These  con¬ 
tacts,  crucial  in  producing  the  frictional  contact  forces 
at  work  in  the  system,  are  determined  in  parallel  using 
an  approach  outlined  in  Section  3.  A  scalable  ren¬ 
dering  pipeline  that  can  leverage  thousands  of  CPU 
cores  for  visualization  purposes  is  discussed  in  Sec¬ 
tion  4.  The  engineering  application  used  to  demon¬ 
strate  this  parallel  simulation  capability  is  that  of  a 
light  tracked  vehicle  that  operates  on  granular  terrain 
and  negotiates  an  obstacle  course.  To  further  illustrate 
the  versatility  of  the  simulation  capability,  the  vehi¬ 
cle  is  assumed  to  be  equipped  with  a  drilling  device 
used  to  penetrate  the  terrain.  Both  the  vehicle  dynam¬ 
ics  and  the  drilling  process  are  seamlessly  analyzed 
within  the  same  HPC-enabled  simulation  capability. 
A  schematic  of  the  vehicle  is  provided  in  Fig.  1.  A 
cut-away  image  of  the  drilling  tool  is  shown  in  isola¬ 
tion  in  Fig.  2. 

2  THE  MANY-BODY  DYNAMICS  PROBLEM 
2.1  General  Considerations 

The  modeling  approach  adopted  in  order  to 
abstract  and  represent  the  dynamics  of  the  vehi¬ 
cle/terrain  interaction  is  based  on  a  differential  vari¬ 
ational  inequality  (DVI)  methodology.  Compared  to 
penalty  or  regularization  approaches  [10,  21,  27,  28], 
it  allows  for  larger  integration  step  sizes.  The  formu- 
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FIGURE  1:  Light  autonomous  vehicle  negotiating  a  pile  of  rubble. 


FIGURE  2:  Cutaway  view:  Anchor  penetrating  granular  material  [17]. 


lation  of  the  equations  of  motion,  that  is,  the  equa¬ 
tions  that  govern  the  time  evolution  of  a  multibody 
system,  is  based  on  the  so-called  absolute,  or  Carte¬ 
sian,  representation  of  the  position  and  attitude  of  each 
rigid  body  in  the  system.  The  state  of  the  system 
is  denoted  by  the  generalized  positions  q  =  \r\  ,e] . 

. . .  7  G  K7'7,;  and  their  time  derivatives  q  = 

[rf ,  ef, . . . ,  rj.  ,  e;7  J 7  g  M?nb,  where  nb  is  the  number 
of  bodies,  rj  is  the  absolute  position  of  the  center  of 
mass  of  the  jth  body,  and  the  quaternions  (Euler  pa¬ 
rameters)  £j  are  used  to  represent  rotation  and  to  avoid 
singularities.  Instead  of  using  quaternion  derivatives 


in  q,  it  is  more  advantageous  to  work  with  angu¬ 
lar  velocities  expressed  in  the  local  (body-attached) 
reference  frames;  in  other  words,  the  method  de¬ 
scribed  will  use  the  vector  of  generalized  velocities 
v  =  [r[,  caf , . . . , rjlh .  coJJ 7  g  M6"* .  Note  that  the  gen¬ 
eralized  velocity  can  be  easily  obtained  as  q  =  L(q)v, 
where  L  is  a  linear  mapping  that  transforms  each 
CO,  into  the  corresponding  quaternion  derivative  £,■  by 
means  of  the  linear  algebra  formula  £,  =  ^Gr(q)ft),;, 
with  3x4  matrix  G(q)  as  defined  in  [11]. 
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2.1.1  Bilateral  Constraints  Bilateral  constraints 
represent  kinematic  relationships  between  two  rigid 
bodies  in  the  system.  For  example,  spherical  joints, 
prismatic  joints,  or  revolute  joints  can  be  expressed  as 
holonomic  algebraic  equations  constraining  the  rela¬ 
tive  positions  of  two  bodies.  A  set  B  of  constraints 
leads  to  a  collection  of  scalar  equations 

^•(q,0  =  0,  ie&,  (1) 

the  number  of  which  depends  on  the  type  of  con¬ 
straints  in  set  B.  Bilateral  constraints  must  also  be  sat¬ 
isfied  at  the  velocity  level, 


cm^t) 

dt 


dWi.  ,  dWi 

dqq+  dt 


V  +  <^± 
q  1  q+  dt 

V^L(q)y  + 


d% 

dt 


0, 


which  is  obtained  by  taking  one  time  derivative  of 
Equation  1. 


2.1.2  Unilateral  Constraints  and  Friction  Uni¬ 
lateral  constraints  enforce  contact  constraints  between 
rigid  bodies  in  the  system.  It  is  assumed  that  a  gap 
function,  <F(q),  can  be  defined  for  each  pair  of  near¬ 
enough  bodies.  This  gap  function  describes  the  dis¬ 
tance  between  the  closest  points  on  the  two  bodies  of 
interest. 

Unilateral  contact  constraints  also  introduce  fric¬ 
tion  forces  into  the  system.  When  a  contact  is  active, 
or  d>j(q)  =  0,  a  normal  force  acts  on  each  of  the  two 
bodies  at  the  contact  point.  When  a  contact  is  inac¬ 
tive,  or  <J>,-(q)  >  0  ,  no  normal  force  exists.  This  rep¬ 
resents  a  complementarity  condition.  Consider  two 
bodies  A  and  B  in  contact  as  shown  in  Fig.  3.  Let  n, 
be  the  normal  at  the  contact  pointing  toward  the  exte¬ 
rior  of  the  body  of  lower  index,  which  by  convention 
is  considered  to  be  body  A.  Let  u;  and  w,  be  two  vec¬ 
tors  in  the  contact  plane  such  that  n,,u;,w,  G  M3  are 


mutually  orthonormal  vectors.  The  frictional  contact 
forces  are  defined  by  the  multipliers  Yin  >  0,  jy„,  and 
Yi  w,  which  lead  to  the  normal  component  of  the  fric¬ 
tion  force,  F;yv  =  Yi.nni  and  the  tangential  component 
of  the  force  Ff,r  =  )5,«Uf  + 

The  Coulomb  friction  model,  which  draws  for 
contact  i  on  the  friction  coefficient  /i7,  is  used  to  write 
the  following  constraints: 


Yin  >  0,  <F,(q)>0,  <J>/(q)tf,«  =  0,  (2) 


BiYi,n  >  \  Yju  +  Y'i 


*l,W  ’ 


V/r 


0, 


(F i,T,\ij)  =  —  ||F/,r||  Hv^rll  (3) 

Equation  2  captures  the  complementarity  condition 
previously  described.  The  subsequent  two  equations 
relate  the  magnitude  and  direction  of  the  friction  force 
to  the  multipliers  and  tangential  velocity  of  the  con¬ 
tact.  These  remaining  equations  can  be  expressed  in 
an  equivalent  manner  using  the  maximum  dissipation 
principle.  This  frames  the  Coulomb  friction  model  as 
a  minimization  problem,  which  can  be  seen  in  Equa¬ 
tion  4. 


argmin  \JT  (y-.„u,  +  %ww ,•) .  (4) 


The  nature  of  the  friction  cone  can  be  seen  if  yet  an¬ 
other  form  of  the  friction  force  equations  is  consid¬ 
ered.  The  friction  force  of  the  i-th  contact  can  be  ex¬ 
pressed  as  follows,  where  T  is  a  cone  in  three  dimen¬ 
sions  whose  slope  is  tan /l;\ 

F j  =  F,yv  "F  F i  t  —  Yin^i  3”  YixA^-i  3~  Yi-W^i  G  T,  (5) 
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FIGURE  3:  Contact  i  between  two  bodies  A,B  G  {1,2, . .  .,«&}. 


2.1.3  Equations  of  Motion  of  Systems  with  Fric¬ 
tional  Contact  The  time  evolution  of  the  dynamical 
system  is  governed  by  the  following  differential  vari¬ 
ational  inequality  [6] : 


s/,g  are  the  contact  point  positions  in  body  coordinates 
(see  Fig.  3).  A  tilde  x  over  a  vector  x  G  M3  represents 
the  skew  symmetric  matrix  associated  with  the  outer 
product  of  two  vectors  [11]. 


q  =  L(q)v 

Mv  =  f (f,q,v)  +  E  %bV^i+ 

ieSS 

T  E  (Yi,n Di,n  T  T  Yi,w^i,w) 

iesrf 

ie38  :  T,/(q,r)  =  0 
i  eg/  •  Yi,n>  o  -L  q>/(  q)  >  0,  and 
(' Yi,uiYi,w )  =  argmin  v  (Yi,u^i,u~ F . 

(6) 

The  tangent  space  generators  D,  =  [D(-n,  D;  D;>]  G 
WL(mi> x  3  are  sparse  and  are  defined  given  a  pair  of  con¬ 
tacting  bodies  A  and  B  as 


D 


[°  ■  ■  ■  ~AIp  AIp A^iA  0 

0  AlP  ~aIpab~Kb  0  . . .] , 


(7) 


where  A,4  is  the  orientation  matrix  associated  with 
body  A,  Ai  p  —  [n,-,u,-,  w,-]  is  the  M3x3  matrix  of  the  lo¬ 
cal  coordinates  of  the  z'th  contact,  the  vectors  sla  and 


2.2  Discretization  Scheme  for  Numerical  Solution 

Equation  2  expresses  the  complementarity  condi¬ 
tion  between  the  normal  force  and  the  gap  function 
in  a  contact  event.  The  presence  of  these  comple¬ 
mentarity  conditions  is  the  trademark  of  a  DVI  for¬ 
mulation,  whose  numerical  solution  in  the  context  of 
rigid  body  dynamics  can  be  traced  back  to  [14, 16, 18]. 
The  DVI  formulations  have  been  classified  by  differ¬ 
ential  index  in  [22]  and  recent  time-stepping  schemes 
have  included  both  acceleration-force  linear  comple¬ 
mentarity  problem  (LCP)  approaches  [7,  23,  34]  and 
velocity-impulse  LCP-based  time-stepping  methods 
[4,  5,  29,  30].  The  LCPs,  obtained  as  a  result  of  the 
introduction  of  inequalities  in  time-stepping  schemes 
for  DVI,  coupled  with  a  polyhedral  approximation  of 
the  friction  cone  must  be  solved  at  each  time  step  in 
order  to  determine  the  system  state  configuration  as 
well  as  the  Lagrange  multipliers  representing  the  re- 
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action  forces  [14, 30].  If  the  simulation  entails  a  large 
number  of  contacts  and  rigid  bodies,  as  in  the  case  of 
part  feeders,  packaging  machines,  and  granular  flows, 
the  computational  burden  of  classical  LCP  solvers  can 
become  significant.  Indeed,  a  well-known  class  of  nu¬ 
merical  methods  for  LCPs  based  on  simplex  methods , 
also  known  as  direct  or  pivoting  methods  [9],  may 
exhibit  exponential  worst-case  complexity  [8].  They 
may  be  impractical  even  for  problems  involving  as 
few  as  several  hundred  bodies  when  friction  is  present 
[3,  32].  Moreover,  the  three-dimensional  Coulomb 
friction  case  leads  to  a  nonlinear  complementarity 
problem  (NCP):  the  use  of  a  polyhedral  approxima¬ 
tion  to  transform  the  NCP  into  an  LCP  introduces  ar¬ 
tificial  anisotropy  in  friction  cones  [4,30,34].  This 
discrete  and  finite  approximation  of  friction  cones  is 
one  of  the  reasons  for  the  large  dimension  of  the  prob¬ 
lem  that  needs  to  be  solved  in  multibody  dynamics 
with  frictional  contact. 


In  order  to  circumvent  the  limitations  imposed  by 
the  use  of  classical  LCP  solvers  and  the  limited  accu¬ 
racy  associated  with  polyhedral  approximations  of  the 
friction  cone,  a  parallel  fixed-point  iteration  method 
with  projection  on  a  convex  set  has  been  proposed, 
developed,  and  tested  in  [6].  The  method  is  based  on 
a  time-stepping  formulation  that  solves  at  every  step  a 
cone  constrained  optimization  problem  [1].  The  time¬ 
stepping  scheme,  proved  to  converge  in  a  measure  dif¬ 
ferential  inclusion  sense  to  the  solution  of  the  original 
continuous-time  DVI,  sets  off  at  time  tj  by  assuming 
that  a  set  of  contacts,  g/,  exists  between  bodies  in  the 
system,  and  a  set  of  bilateral  constraints,  dd,  is  also 
active.  The  governing  differential  equations  then  as¬ 
sume  the  form  of  a  DVI  problem.  The  equation  of  mo¬ 
tion  is  discretized  so  that  an  approximation  to  the  so¬ 
lution  can  be  found  at  discrete  instants  in  time.  Given 
a  position  qW  and  velocity  v(/j  at  the  time  step  t{l), 
the  numerical  solution  is  found  at  the  new  time  step 
+h  by  solving  the  following  optimization 


problem  with  equilibrium  constraints  [31]: 

M(v(/+1)  -yW)  =  M(?«,q(0?vOO)  +  £  Yi,b^i  + 

ieSS 

T  (  Yi,n  D/,n  +  Yi.u  D/.m  +  Yi,w  D;,w)  ,(8) 

('e.f :  ivP/(qW,t)  +  VvPfv(/+1)  +  5i  =0  (9) 

(TV:  0<  j4>/(q(/))  +  D?Bv(/+1)  J.^>0,  (10) 

{,Yi,ui  Yi,w)  =  argmin  v^  ^  T  Yi,w^i,w) 

(ID 


q(/+1l  =  qW  +/*L(qW)y(/+1).  (12) 

Here,  ys  represents  the  constraint  impulse  of  a  con¬ 
tact  constraint;  that  is,  Ys  =  h%,  for  5  =  n.u.  w.  The 
^fl>;(q^)  term  achieves  constraint  stabilization;  its  ef¬ 
fect  is  discussed  in  [2].  Similarly,  the  term  ^'Pj(qW) 
achieves  stabilization  for  bilateral  constraints.  The 
scheme  converges  to  the  solution  of  a  measure  differ¬ 
ential  inclusion  [1]  when  the  step  size  h  — »  0. 

The  proposed  approach  casts  the  problem  as 
a  monotone  optimization  problem  through  a  relax¬ 
ation  over  the  complementarity  constraints,  replacing 
Eq.  (10)  with 

i  e  srf : 

— H,V(vI'Dl>)2+  (vrDi,„)2  Y„  >  0. 

The  solution  of  the  modified  time-stepping 
scheme  will  approach  the  solution  of  the  same  mea¬ 
sure  differential  inclusion  for  h  — >  0  as  the  original 
scheme  [1],  yet,  in  some  situations,  for  large  h,  p,  or 
relative  velocity  v^/+1),  i.e.,  when  not  in  an  asymptotic 
regime,  this  relaxation  can  introduce  motion  oscilla¬ 
tions.  It  was  shown  in  [6]  that  the  modified  scheme 
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is  a  cone  complementarity  problem  (CCP),  which  can 
be  solved  efficiently  by  an  iterative  numerical  method 
that  relies  on  projected  contractive  maps.  Omitting 
for  brevity  some  of  the  details  discussed  in  [6,  33], 
we  note  that  the  algorithm  makes  use  of  the  following 
vectors: 

k  =  Mv(z)+/*f(t(/),q(/),v(/))  (13) 

b,  =  {id>,(q(/)),0,o}r  i  G  .<  (14) 

q(Z),r)  +  ^S  ieSB.  (15) 

The  solution,  in  terms  of  dual  variables  of  the  CCP 
(the  multipliers),  is  obtained  by  iterating  the  following 
contraction  maps  until  convergence,  where  Fix,  repre¬ 
sents  the  orthogonal  projection  on  the  friction  cone 
associated  with  contact  i  [31]: 

v/€^:  tf+1  =nT(.  [yi-(ori,(Djyr  +  b,)]  (16) 
Vie#  :  tf+1  =  nT.  [ft  -  cor],  (VWfV  +  bi)]  (17) 

At  each  iteration  r,  before  repeating  (16)  and  (17), 
also  the  primal  variables  (the  velocities)  are  updated 
as 

vr+1  =  M_1  (  £  Dzf+l  +  £  VvP-y!'+1  +  k  j(.18) 

\ze£f  ) 

2.3  Parallel  Implementation 

The  dynamics  of  a  large  multibody  system  whose 
bodies  interact  through  contact,  friction,  and  bilateral 
constraints  can  be  simulated  in  time  via  the  CCP  algo¬ 
rithm  previously  described.  A  sequential  implemen¬ 
tation  of  this  algorithm  is  described  by  the  following 
pseudo-code: 


Algorithm  1:  Inner  Iteration  Loop 

1 .  For  i  G  ,c/(q,  <5 ) ,  evaluate  i],  —  3 /Trace(  D f  M 1 D,) . 

2.  For  i  G  SB,  evaluate  7],  =  l/(  WfM -1V'P!). 


3.  Warm  start:  if  some  initial  guess  y"  is  available  for 
multipliers,  then  set  y°  =  y*,  otherwise  y°  =  0. 

4.  Initialize  velocities:  v°  =  + 

5.  For  i  G  s/(q^\8),  compute  changes  in  multipli¬ 
ers  for  contact  constraints: 

y+l  =  x  nT.  (y[  -  (07],  (Djr+bi))  + 

(i-W; 

A?f+1  =  Y'+l  -  Yi  ; 

A  V,-  =  1VU1  D,Ayf+1. 

6.  For  i  G  SB,  compute  changes  in  multipliers  for  bi¬ 
lateral  constraints: 

Y'+1  =  A  (y[  —  (OT]i  (VT'f  \r  +  b,) )  + 

(1-W; 

A  Yi+1  =  Y+1  -  Yi  ; 

Avi  =  M-1V'P,Ayf+1. 

7.  Apply  updates  to  the  velocity  vector: 

vr+1  =  vr  +  A ,Vi  +  L,-e#  Av;- 

8.  r  r+  1.  Repeat  from  5  until  convergence,  or 
until  r  >  rmax. 


The  stopping  criterion  is  based  on  the  value  of  the 
velocity  update.  The  overall  algorithm  that  provides 
an  approximation  to  the  solution  of  Eqs.  8  through  12 
relies  on  Algorithm  1  and  requires  the  following  steps: 


Algorithm  2:  Outer,  Time-Stepping,  Loop 

1 .  Set  t  =  0,  step  counter  /  =  0,  provide  initial  values 
for  q^  and  v^. 

2.  Perform  collision  detection  between  bodies,  ob¬ 
taining  risf  possible  contact  points  within  a  dis¬ 
tance  S.  For  each  contact  i,  compute  D/)W,  D,.„, 
DjtW;  for  each  bilateral  constraint  compute  the 
residual  <F,(q),  which  also  provides  by. 

3.  For  each  body,  compute  forces  f(f^\q^,v^). 

4.  Use  Algorithm  1  to  solve  the  cone  complemen¬ 
tarity  problem  and  obtain  unknown  impulse  y  and 
velocity  v^z+1\ 

5.  Update  positions  using  q(/+1)  =  q-®  + 

6.  Increment  t  :=  t  +  h,  l  :=/  +  !,  and  repeat  from 
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step  2  until  t  >  /end 


A  parallel  implementation  that  leveraged  the  par¬ 
allel  computing  power  of  commodity  GPUs  was  con¬ 
sidered  based  on  the  two  algorithms  outlined  above. 
Solution  of  the  CCP  problem  proceeds  as  a  collec¬ 
tion  of  functions,  or  kernels,  which  are  executed  on 
the  GPU.  First,  some  pre-processing  steps  are  exe¬ 
cuted.  Applied  forces  are  calculated  in  a  body-parallel 
fashion,  and  contacts  are  preprocessed  in  a  contact- 
parallel  fashion  to  compute  the  normal  direction  and 
friction  plane  directions.  Next,  the  inner  iteration  loop 
is  entered  and  a  series  of  four  kernels  is  executed  until 
convergence.  In  a  contact-parallel  manner,  the  unilat¬ 
eral  constraints  are  processed.  In  a  constraint-parallel 
manner,  the  bilateral  constraints  are  processed.  In 
a  reduction-slot-parallel  manner,  speed  updates  are 
summed  to  a  single  resultant  per  body.  Finally,  in 
a  body-parallel  manner,  speed  updates  are  applied  to 
each  body.  Once  a  certain  number  of  iterations  has 
been  performed  or  convergence  has  been  achieved,  the 
generalized  velocities  are  integrated  forward  in  time 
in  a  body-parallel  fashion  to  get  the  set  of  generalized 
positions.  Details  of  the  parallel  reduction  of  speed- 
updates  can  be  found  in  [19].  Pseudo-code  for  the 
parallel  implementation  can  be  seen  below.  Details  re¬ 
garding  data  structures  and  computational  flow  of  the 
parallel  implementation  can  also  be  found  in  [19,33]. 
The  details  regarding  the  parallel  collision  detection 
are  provided  in  Section  3. 


Parallel  Kernels  for  Solution  of  Dynamics 
Problem 

1 .  Parallel  Collision  Detection 

2.  (Body  parallel)  Force  kernel 

3.  (Contact  parallel)  Contact  preprocessing  kernel 

4.  Inner  Iteration  Loop: 

(a)  (Contact  parallel)  CCP  contact  kernel 

(b)  (Bilateral-Constraint  parallel)  CCP  constraint 
kernel 

(c)  (Reduction-slot  parallel)  Velocity  change  re¬ 


duction  kernel 

(d)  (Body  parallel)  Body  velocity  update  kernel 

5.  (Body  parallel)  Time  integration  kernel 


3  PARALLEL  COLLISION  DETECTION 

The  implemented  3D  collision  detection  algorithm 
performs  a  two-level  spatial  subdivision  using  axis- 
aligned  bounding  boxes.  The  first  partitioning  occurs 
at  the  CPU  level  and  yields  a  relatively  small  num¬ 
ber  of  large  boxes.  The  second  partitioning  of  each  of 
these  boxes  occurs  at  the  GPU  level  yielding  a  large 
number  of  small  bins.  The  GPU  3D  collision  detec¬ 
tion,  which  handles  spheres,  ellipsoids,  and  planes, 
occurs  in  parallel  at  the  bin  level.  Any  other  geome¬ 
tries  are  represented  as  a  collection  of  these  primitives 
using  a  padding  (decomposition)  process  detailed  in 
[12].  Several  kernel  calls  build  on  each  other  to  even¬ 
tually  enable,  in  a  one-thread-per-bin  GPU  parallel 
fashion,  an  exhaustive  collision  detection  process  in 
which  thread  i  checks  for  collisions  between  all  the 
bodies  that  happen  to  intersect  the  associated  bin  i. 
This  requires  ^(bf)  computational  effort,  where  bj 
represents  the  number  of  bodies  touching  bin  i.  The 
value  of  bi  is  controlled  by  an  appropriate  selection 
of  the  bin  size.  Figure  4  illustrates  a  typical  collision 
detection  scenario  and  is  used  in  what  follows  to  out¬ 
line  the  nine  stages  of  the  proposed  approach.  Note 
that  the  actual  implementation  is  for  3D  collision  de¬ 
tection  and  does  not  require  the  bodies  to  be  spheres. 
Stage  1.  The  process  begins  by  counting  for  each  ob¬ 
ject  the  number  of  bins  it  intersects.  As  Fig.  5  shows, 
an  object  (body)  can  intersect,  or  touch,  more  than  one 
bin.  The  minimum  and  maximum  bounding  points  of 
each  object  are  determined  and  placed  in  their  respec¬ 
tive  bins.  For  example.  Fig.  5  shows  that  object  4’s 
minimum  point  lies  in  B4  and  its  maximum  point  in 
A5.  The  entire  object  must  fit  between  the  minimum 
and  maximum  points;  therefore  the  number  of  bins 
that  the  object  intersects  can  be  determined  quickly  by 
counting  the  number  of  bins  between  the  two  points  in 
each  axis  and  multiplying  them.  In  this  case  the  num- 
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FIGURE  4:  Two-dimensional  example  used  to  intro¬ 
duce  the  nine  stages  of  the  collision  detection  pro¬ 
cess.  The  grid  is  aligned  to  a  global  Cartesian  refer¬ 
ence  frame. 


N 

FIGURE  6:  Array  T  with  N  entries,  based  on  spatial 
subdivision  in  Fig.  4. 


her  is  4.  For  each  body,  this  number  is  saved  into  an 
array  T  (see  Fig.  6),  of  size  equal  to  the  number  of 
bodies  N. 

Stage  2.  An  inclusive  parallel  prefix  sum  is  carried 
out  on  T  [26].  The  CUDA-based  Thrust  library  im¬ 
plementation  [13]  of  the  scan  algorithm  operates  on  T 
to  return  in  S  (see  Fig.  7)  the  memory  offset  informa¬ 
tion. 

Stage  3.  An  array  B  (see  Fig.  8),  is  first  allocated  of 
size  equal  to  the  value  of  the  last  element  in  S.  This 
value  is  equal  to  the  total  number  of  object-bin  inter¬ 
sections.  Each  element  in  B  is  set  to  a  key- value  pair 
of  two  unsigned  integers.  The  key  is  the  bin  id  and 


FIGURE  5:  Minimum  and  maximum  bounds  of  object, 
based  on  spatial  subdivision  in  Fig.  4. 
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FIGURE  7:  Result  of  prefix  sum  operation  on  T, 
based  on  spatial  subdivision  in  Fig.  4.  Each  entry 
represents  an  object’s  offset  based  on  the  number 
of  bins  it  touches. 


the  value  is  the  object  id.  In  this  stage,  the  memory 
offsets  contained  in  S  are  used  so  that  the  thread  as¬ 
sociated  with  each  body  can  write  data  to  the  correct 
location  in  B. 

Stage  4.  In  this  stage,  the  key-value  array  B  is  sorted 
by  key,  that  is,  by  bin  id.  This  effectively  inverts 
the  body-to-bin  mapping  to  a  bin-to-body  mapping  by 
grouping  together  all  bodies  in  a  given  bin  for  further 
processing.  The  stage  draws  on  the  GPU-based  radix 
sort  from  the  Thrust  library  [13]. 

Stage  5.  Next,  the  start  of  each  bin  in  the  sorted  array 
B  is  identified  in  parallel.  The  number  of  threads  used 
to  this  end  is  equal  to  the  number  of  elements  in  B;  i.e., 
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FIGURE  8:  Array  B,  based  on  spatial  subdivision  in  Fig.  4. 
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FIGURE  9:  Sorted  array  B,  based  on  spatial  subdivision  in  Fig.  4. 


the  number  of  object-bin  interactions.  Each  thread 
reads  the  current  and  previous  bin  value;  if  these  val¬ 
ues  differ,  then  the  start  of  a  bin  has  been  detected. 
The  starting  positions  for  each  bin  are  written  into  an 
array  C  of  key-value  pairs  of  size  equal  to  the  number 
of  bins  in  the  3D  grid.  When  the  start  of  a  bin  is  found 
in  array  B,  the  thread  and  bin  id  are  saved  as  the  key 
and  value,  respectively.  This  pair  is  written  to  the  ele¬ 
ment  in  C  indexed  by  the  bin  id.  Note  that  not  all  bins 
are  active.  Inactive  bins;  i.e.,  bins  touched  by  zero 
or  one  bodies,  are  set  to  Oxffffffff,  the  largest  possible 
value  for  an  unsigned  integer  on  a  32-bit,  X86  archi¬ 
tecture.  Figure  10  shows  the  outcome  of  this  stage. 

Stage  6.  The  array  C  is  next  radix- sorted  [13]  by  key. 
Consequently,  inactive  bins  (identified  by  the  Oxffffffff 
entries,  represented  for  brevity  as  Oxfff  in  Fig.  11) 
“migrate”  to  the  end  of  the  array. 

Stage  7.  The  total  number  of  active  bins  is  determined 
next  by  finding  the  index  in  the  sorted  array  C  of  the 
first  occurrence  of  Oxffffffff.  Determining  this  index 
allows  memory  and  thread  usage  to  be  allocated  accu¬ 
rately  thus  having  no  threads  wasted  on  inactive  bins. 
One  GPU  thread  is  assigned  in  this  stage  to  each  active 
bin  to  perform  an  exhaustive,  brute-force,  bin-parallel 
collision  detection  for  the  purpose  of  only  counting 
the  collision  events.  By  carefully  selecting  the  bin 
size,  the  number  of  objects  being  tested  for  collisions 


is  expected  to  be  small;  i.e.,  on  average,  bj  is  in  the 
range  of  3  to  4  objects  per  bin.  After  counting  the 
total  number  of  collisions  in  its  bin,  the  thread  writes 
that  tally  into  an  unsigned  integer  array  D  of  size  equal 
to  the  number  of  active  bins. 

More  involved,  the  algorithm  for  counting  and 
subsequently  computing  ellipsoid  collision  informa¬ 
tion  is  described  in  detail  in  [24].  For  spheres,  the 
algorithm  checks  for  collisions  by  calculating  the  dis¬ 
tance  between  the  centers  of  the  objects.  Contacts  can 
occur  only  when  the  distance  between  the  spheres’ 
centers  is  less  than  or  equal  to  the  sum  of  their  radii. 
Because  one  object  could  be  contained  within  more 
than  one  bin,  checks  were  implemented  to  prevent 
double  counting.  Since  the  midpoint  of  a  collision 
volume  can  be  contained  only  within  one  bin,  only  one 
thread  (associated  with  that  bin)  will  register/count  a 
collision  event.  For  example,  in  order  to  determine 
the  midpoint  of  the  collision  volume  the  algorithm  re¬ 
lies  on  the  vector  from  the  centroid  of  object  4  to  the 
centroid  of  object  7;  see  Fig.  12.  The  points  where  this 
vector  intersects  each  object  defines  a  segment;  the  lo¬ 
cation  of  the  middle  of  this  segment  is  used  to  decide 
the  unique  bin  that  claims  ownership  of  the  contact.  If 
one  object  is  completely  inside  the  other,  the  midpoint 
of  the  collision  volume  is  the  centroid  of  the  smaller 
object.  Using  this  process,  the  number  of  collisions 
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FIGURE  10:  Array  C,  based  on  spatial  subdivision  in  Fig.  4. 
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FIGURE  1 1 :  Sorted  array  C,  based  on  spatial  subdivision 
in  Fig.  4. 


are  counted  for  each  bin  and  written  to  D. 

Stage  8.  An  inclusive  parallel  prefix  scan  operation 
[13]  is  performed  on  D.  This  returns  an  array  E  whose 
last  element  is  the  total  number  of  collisions  in  the 
uniform  grid,  a  value  that  allows  an  exact  amount  of 
memory  to  be  allocated  in  the  next  stage. 

Stage  9.  The  final  stage  of  the  collision  detection  al¬ 
gorithm  computes  the  actual  contact  information.  To 
this  end,  an  array  of  contact  information  structures  F 
is  allocated  with  a  size  equal  to  the  value  of  the  last 
element  in  E.  The  collision  pairs  are  then  found  by  us¬ 
ing  the  algorithm  outlined  in  Stage  7.  Instead  of  sim¬ 
ply  counting  the  number  of  collisions,  actual  contact 
information  is  computed  and  written  to  its  respective 
place  in  F. 


4  RENDERING  PIPELINE 

Adequately  understanding  the  results  of  a  simu¬ 
lation  would  be  very  arduous  without  an  element  of 
visualization  since,  as  systems  become  more  com- 
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FIGURE  12:  Center  of  collision  volume. 

Based  on  spatial  subdivision  in  Fig.  4. 

plex,  the  sheer  volume  of  components  and  numerical 
outputs  makes  the  results  exceedingly  difficult  to  in¬ 
terpret;  consequently,  rendering  is  a  critical  last  step 
to  the  modeling  and  simulation  process.  To  address 
this  issue,  a  high  performance  visualization  pipeline 
has  been  generated  that  allows  for  a  simple  means  to 
create  general-purpose  renderings  of  arbitrary  mod¬ 
els.  It  supports  a  variety  of  simulation  data  files  (csvs, 
custom-format,  etc.)  to  remotely  and  easily  generate 
an  animation  during  or  immediately  after  a  simulation 
is  computed.  The  “high-performance”  attribute  of  this 
pipeline  stems  from  its  ability  to  scale  up  to  1000s  of 
CPU  cores  as  demonstrated  by  its  use  on  the  Euler  su¬ 
percomputer  available  to  this  research  group  [25]. 

Creating  this  pipeline  poses  several  technical 
problems,  the  most  conspicuous  of  which  being  how 
to  “automate”  the  process  as  well  as  how  to  handle 
massively  complex  scenes.  These  issues  have  been 
addressed  by  utilizing  Renderman  [?]  in  conjunction 
with  in-house  developed  code  with  the  overall  goal 
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of  implementing  a  full  fledged  distributed-computing 
rendering  solution.  The  developed  solution  leverages 
Renderman’s  REYES  (Renders  Everything  You  Ever 
Saw)  algorithm  to  handle  complexity,  uses  computer 
clusters  to  process  jobs,  and  utilizes  the  Renderman 
Bytestream  and  Shading  Language  to  generate  proce¬ 
dural  scenes.  Handling  geometrically  complex  scenes 
with  limited  hardware  resources,  such  as  rendering 
a  model  with  millions  of  granular  bodies,  is  an  in¬ 
surmountable  challenge  for  many  commercial  Tender¬ 
ers.  However,  Renderman  was  particularly  designed 
to  handle  this  problem  with  the  REYES  algorithm.  At 
the  high-level,  the  REYES  algorithm  is  a  micropoly¬ 
gon  Tenderer,  which  only  performs  shading  compu¬ 
tations  for  a  subset  of  visible  polygons  at  any  given 
time,  loading  just  this  relevant  scene  data  into  mem¬ 
ory.  This  data  can  be  ’’bucketed”  according  to  grids 
of  pixels  of  the  output  image  and  rendered  indepen¬ 
dently.  The  REYES  pipeline  is  illustrated  in  Fig.  13. 

Speed  is  another  critical  attribute  of  the  render¬ 
ing  pipeline;  this  is  an  attribute  where,  given  the  size 
of  the  many-body  systems  considered,  distributed- 
computing  becomes  an  absolute  necessity.  To  put  it 
in  perspective,  in  order  to  render  a  two-hour  movie 
at  24  frames  per  second  in  one  year,  each  frame  can 
afford  only  three  minutes  of  render  time,  a  relatively 
short  timeframe  for  complex  scenes.  Computer  clus¬ 
ters  combined  with  Renderman  offer  two  vital  means 
to  parallelize  rendering:  simultaneous  image  render¬ 
ing  and  distributed  bucket  rendering.  Simultaneous 
image  rendering  tasks  individual  compute  nodes  with 
rendering  a  single  image  from  the  animation,  the  num¬ 
ber  of  frames  rendered  in  parallel  scaling  linearly  with 
the  number  of  nodes.  Distributed  bucket  rendering  al¬ 
lows  for  parallel  rendering  of  the  same  image,  where 
pixel  buckets  are  rendered  independently  on  separate 
nodes  and  then  stitched  back  together  to  form  the  final 
image.  The  benefits  of  these  approaches  are  immedi¬ 
ately  apparent  in  the  speedup  factor,  but  also  with  the 
flexibility  in  rendering  approach,  where  one  can  tai¬ 
lor  the  computation  for  a  particular  need  (such  as  dis¬ 
tributed  bucket  rendering  for  an  immensely-visually 
complex  still  image). 


Finally,  in  order  to  simplify  the  rendering  pro¬ 
cess  for  users  without  a  background  in  graphics,  the 
pipeline,  which  is  illustrated  in  Fig.  14,  must  automate 
image  generation  as  much  as  possible  while  retain¬ 
ing  the  ability  to  render  arbitrary  visual  effects;  the 
Renderman  Bytestream  and  Shading  Language  pro¬ 
vide  the  means  to  meet  this  demand.  The  Renderman 
Bytestream  simply  allows  us  to  pipe  Renderman  calls 
into  the  Tenderer  at  runtime  thus  facilitating  procedu¬ 
ral  calls  as  the  scene  is  being  rendered.  These  pro¬ 
cedural  calls  are  determined  by  interpreting  data  with 
a  simulation-specific  metadata  file  that  is  either  gen¬ 
erated  or  defined  by  the  user.  This  small  metadata 
file  configures  the  formatting  options  (such  as  reso¬ 
lution,  input  data  format,  etc.)  and  the  salient  features 
of  objects  in  the  simulation  (geometry,  appearance, 
etc.).  Attaching  user-specified  Renderman  shaders  to 
the  objects  enables  customization  of  simulation  ob¬ 
ject  appearance.  Shaders  are  highly-functional  com¬ 
piled  bits  of  code  that,  at  a  high-level,  programmati¬ 
cally  control  how  a  micropolygon  is  perturbed  or  col¬ 
ored;  the  Renderman  Shading  Language  is  powerful 
and  flexible  enough  to  make  it  possible  to  define  any 
visual  effect.  The  user  can  draw  from  a  library  of 
shaders  (created  by  our  group)  or  define  their  own  and 
assign  them  in  the  metadata  file,  consequently  retain¬ 
ing  full  control  over  the  appearance  of  their  model. 
One  last  feature  of  the  pipeline  is  the  notion  of  “inject¬ 
ing”  simulation  data  into  predefined  scenes.  Setting 
up  the  aesthetic  components  of  a  scene  can  be  a  huge 
time-investment  and  typically  requires  artistic  ability 
(lighting,  cinematography,  mise-en-scene);  work  an 
engineer  typically  does  not  want  to  deal  with.  The 
implemented  software  infrastructure  offers  a  set  of  di¬ 
rectives  that  one  can  insert  into  existing  Renderman 
scenes  that,  when  encountered  at  render  time,  will  be 
overridden  with  the  emission  of  corresponding  proce¬ 
dural  calls  (such  as  piping  the  interpreted  simulation 
data).  Thus,  the  users  can  specify  a  scene  into  which 
they  want  to  ’’inject”  their  data,  either  drawing  from  a 
library  of  scenes  (made  available  on  Euler)  or  defining 
their  own.  Ultimately,  this  pipeline  abstracts  away  the 
need  to  deal  with  commercial  graphics  applications, 
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FIGURE  13:  Schematic  of  the  REYES  pipeline. 


FIGURE  14:  Schematic  of  the  customized  Renderman  pipeline. 


thus  making  it  possible  to  host  simulation  rendering 
as  a  remote,  “controllably-automatic”  service  for  any¬ 
one  without  a  background  in  graphics. 

5  NUMERICAL  EXPERIMENTS 

5.1  Light  Tracked  Vehicle  Mobility  Simulation 

This  simulation  captures  the  dynamics  of  a  com¬ 
plex  system  comprised  of  many  bilateral  and  unilat¬ 
eral  constraints.  Using  a  combination  of  joints  and  lin¬ 
ear  actuators,  a  tracked  vehicle  model  was  created  and 
then  simulated  navigating  over  either  flat  rigid  terrain 
or  deformable  terrain  made  up  of  gravel-type  granular 
material.  The  vehicle  is  modeled  to  represent  a  small, 
lightweight  tracked  vehicle  much  like  an  autonomous 
robot  that  could  be  sent  to  another  planet  or  used  to 
navigate  dangerous  terrain. 


There  are  two  tracks,  each  with  61  track  shoes  (see 
Fig.  1).  Each  track  shoe  is  made  up  of  two  cylinders 
and  three  rectangular  plates  and  has  a  mass  of  .34  kg. 
Each  shoe  is  connected  to  its  neighbors  using  one  pin 
joint  on  each  side,  allowing  the  tracks  to  rotate  rela¬ 
tive  to  each  other  only  along  one  axis.  Within  each 
track  there  are  five  rollers,  each  with  a  mass  of  15 
kg,  one  idler  and  one  sprocket  both  with  a  mass  of 
15  kg.  The  chassis  is  modeled  as  a  rectangular  box 
with  a  mass  of  200  kg  and  moments  of  inertia  were 
computed  for  all  parts  using  a  CAD  package.  The  pur¬ 
pose  of  the  rollers  is  to  keep  the  tracks  separated  and 
support  the  weight  of  the  vehicle  as  it  moves  forward. 
The  idler  is  necessary  as  it  keeps  the  track  tensioned. 
It  is  usually  modeled  with  a  linear  spring/actuator  but 
for  the  purposes  of  demonstration  it  was  fixed  using  a 
revolute  joint,  to  the  vehicle  chassis.  The  sprocket  is 
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used  to  drive  the  vehicle  and  is  attached  using  a  rev¬ 
olute  joint  to  the  chassis.  Torque  is  applied  to  drive 
the  track,  with  each  track  driven  independently  of  the 
other.  When  the  sprocket  rotates,  it  comes  into  con¬ 
tact  with  the  cylinders  on  the  track  shoe  and  turns  the 
track  with  a  gear  like  motion. 

The  track  for  the  vehicle  was  created  by  first  gen¬ 
erating  a  ring  of  connected  track  shoes.  This  ring 
was  dropped  onto  a  sprocket,  five  rollers,  and  an  idler 
which  was  connected  to  the  chassis  using  a  linear 
spring.  The  idler  was  pushed  with  2000  N  of  force 
until  the  track  was  tensioned  and  the  idler  had  stopped 
moving.  This  pre-tensioned  track  was  then  saved  to  a 
data  file  and  loaded  for  the  simulation  of  the  complete 
vehicle. 

5.2  Simulation  Results  for  Tracked  Vehicle 

In  this  simulation  scenario,  the  tracked  vehicle 
was  dropped  onto  a  flat  surface  and  a  torque  was  ap¬ 
plied  to  the  sprocket  to  drive  it  forwards;  the  forces 
on  several  revolute  joints  connecting  the  track  shoes 
were  analyzed  as  they  traveled  around  the  sprocket. 
Figure  15  shows  the  forces  in  one  revolute  joint  after 
the  track  has  dropped  onto  the  flat  surface.  Transient 
behavior  is  observed  when  the  torque  is  applied  to  the 
sprocket  at  1  second  and  the  track  shoe  connected  to 
this  joint  comes  into  contact  with  the  sprocket  at  5 
s.  The  oscillatory  behavior  of  the  joint  forces  can 
be  attributed  to  several  factors.  First,  the  tension  in 
the  track  was  very  high;  there  was  no  spring/linear 
actuator  attached  to  the  idler,  so  high  tension  forces 
could  not  be  dampened.  Secondly,  the  combination  of 
a  high  pre-tensioning  force  (2000  N)  and  lack  of  a  lin¬ 
ear  actuator  on  the  idler  resulted  in  high  revolute  joint 
forces. 

Figure  16  shows  the  joint  forces  for  several  revo¬ 
lute  joints  as  their  associated  track  shoes  go  around  the 
sprocket.  This  plot  shows  that  the  forces  in  the  joint 
are  highest  when  the  track  shoe  first  comes  into  con¬ 
tact  with  the  sprocket.  As  the  track  shoe  moves  around 
the  sprocket,  the  force  decreases  as  subsequent  track 
shoes  and  their  revolute  joints  help  distribute  the  load. 
It  should  be  noted  that  the  gearing  motion  between  the 


track  shoes  and  the  sprocket  was  not  ideal  as  it  was 
not  very  smooth.  In  a  more  realistic  model,  forces  be¬ 
tween  track  shoes  would  be  overlapping  so  that  the 
movement  of  the  tracks  is  more  smooth  and  the  forces 
experienced  by  the  revolute  joints  are  smaller. 

Figure  16  shows  the  joint  forces  for  one  revolute 
joint  where  the  tracked  vehicle  was  simulated  as  it 
moved  over  a  bed  of  84,000  granular  particles.  The 
particles  were  modeled  as  large  pieces  of  gravel  with 
a  radius  of  .075  m,  and  a  density  of  1900  kg/m3.  A 
torque  of  100  N-m  was  applied  to  both  sets  of  tracks  to 
move  the  vehicle.  Note  that  unlike  the  case  where  the 
vehicle  moves  on  a  flat  section  of  ground,  the  forces 
experienced  by  the  revolute  joints  are  much  noisier. 
Individual  grains  move  under  the  tracks  as  the  vehi¬ 
cle  moves  causing  large  vibrations  to  travel  through 
the  shoes.  These  vibrations  would  be  reduced  when 
modeling  a  more  complaint  terrain  material  that  can 
dissipate  energy  on  contact. 

5.3  Anchoring  in  Granular  Material 

The  purpose  of  this  effort  is  to  study  the  perfor¬ 
mance  of  different  anchor  designs  and  provide  a  rec¬ 
ommendation  on  which  is  better  suited  to  the  task  of 
anchoring.  The  anchors  will  be  tested  against  a  range 
of  parameters  relating  to  soil,  environment,  and  an¬ 
chor  penetration  angles/velocities  to  better  understand 
their  corresponding  performance  characteristics. 

The  anchor  was  modeled  using  three  types  of  reg¬ 
ular  primitives.  The  tip  of  the  anchor  was  modeled 
using  a  sphere,  the  shaft  was  modeled  using  a  cylin¬ 
der,  and  the  helix  was  modeled  using  67  thin  boxes 
swept  along  a  helical  spline.  This  configuration  is  op¬ 
timal  compared  to  using  a  triangulated  mesh  for  the 
anchor.  The  triangular  mesh  requires  several  thousand 
triangles,  decreasing  the  performance  of  the  collision 
detection  and  increasing  the  memory  requirements  for 
the  simulation.  Using  primitives  has  an  added  ben¬ 
efit;  modifying  the  geometry  of  the  anchor  becomes 
straightforward,  thus  the  pitch  of  the  anchor  along 
with  its  diameter  and  thickness  can  be  varied  easily 
allowing  parametric  studies  to  be  completed. 

The  anchor  had  several  constraints  and  forces 
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Force  vs  Time 


FIGURE  15:  Magnitude  of  force  experienced  by  one  revolute  joint. 


Force  vs  Time 


FIGURE  16:  Magnitude  of  force  experienced  by  5  revolute  joints. 


which  were  used  to  control  its  motion.  First,  there 
was  a  bilateral  constraint  restricting  the  planar  motion 
of  the  anchor,  forcing  it  to  move  straight  up  or  down. 
This  constraint  also  prevented  the  anchor  from  rotat¬ 
ing  in  all  axes  except  the  vertical  axis.  A  pressing 
force  was  used  to  press  the  anchor  into  the  material. 
This  simulated  an  actuator  which  may  be  attached  to 
the  other  end  of  the  anchor,  forcing  it  to  penetrate.  A 


torque  was  applied  to  the  anchor  to  cause  it  to  rotate 
and  screw  into  the  material,  and  a  vertical  force  was 
used  to  pull  the  anchor  out  of  the  material  at  the  end 
of  the  simulation. 

The  numerical  experiment  consisted  of  a  granular 
bed  made  up  of  spheres  of  randomly  varying  radii  that 
was  pre-settled  and  loaded  at  the  start  of  each  simula¬ 
tion.  For  each  simulation  only  the  parameters  associ- 
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FIGURE  17:  Magnitude  of  force  experienced  by  one  revolute  joint  on  granular  terrain. 


ated  with  the  anchor  were  varied.  The  anchor’s  mass 
was  10  kg  with  a  radius  of  0.5  m.  The  granular  ma¬ 
terial  had  a  mass  of  0.005  kg  and  a  radius  randomly 
varying  between  0.025  m  and  0.036  m.  The  granular 
material  had  a  friction  coefficient  of  0.4,  and  gravity 
was  set  to  -9.806  m/s2.  The  time  step  was  5xl0-4  s, 
with  1000  CCP  iterations  performed  per  time  step. 

5.4  Simulation  Results  for  Anchoring  System 

Several  sets  of  parametric  tests  were  simulated  us¬ 
ing  the  anchor  model:  the  torque  applied  to  the  anchor 
was  varied,  and  the  pullout  force  applied  after  anchor¬ 
ing  was  varied.  Fig.  18  shows  an  anchor  with  the  same 
mass  that  has  4  different  torques  applied  to  it.  For 
each  test  at  the  end  the  pullout  force  remained  con¬ 
stant  at  300  N.  The  plot  shows  that  at  2  seconds,  when 
the  anchoring  torque  was  applied,  the  anchor  with  the 
highest  torque  went  in  the  deepest,  fastest,  which  is 
as  expected.  With  the  constant  pulling  force  that  was 
applied  at  7  seconds,  only  the  anchor  with  the  low¬ 
est  anchoring  torque,  600  N-m,  was  pulled  upwards, 
the  mass  of  the  granular  material  above  the  three  other 
anchors  was  too  high  for  the  force  to  have  any  effect. 

Fig.  19,  with  a  close  up  in  Fig.  20,  shows  a  dif¬ 
ferent  set  of  simulations.  Here  the  anchoring  torque 


was  kept  constant,  but  the  pullout  force  was  changed. 
The  purpose  of  the  test  was  to  gauge  the  magnitude  of 
the  pullout  force  required  for  a  given  applied  torque. 
The  plot  shows  that  only  a  force  of  2000  N  was  able  to 
pullout  easily,  gaining  velocity  as  it  moved  upwards. 
The  pullout  force  of  1600  N  was  able  to  start  pulling 
out  slowly  and  at  a  constant  velocity. 

6  CONCLUSIONS  AND  FUTURE  WORK 

This  work  describes  developments  that  expand 
parallel  simulation  capabilities  in  multibody  dynam¬ 
ics.  The  many -body  dynamics  problem  of  interest 
has  been  modeled  as  a  cone  complementarity  problem 
whose  parallel  numerical  solution  scales  linearly  with 
the  number  of  bodies  in  the  system.  These  develop¬ 
ments  have  directly  resulted  in  the  ability  to  simulate 
complex  tracked  vehicles  operating  on  granular  ter¬ 
rain.  The  parallel  simulation  capability  was  demon¬ 
strated  in  the  context  of  an  application  that  empha¬ 
sizes  the  interplay  between  light-vehicle/track/terrain 
dynamics,  where  the  vehicle  feature  length  becomes 
comparable  with  the  dimensions  associated  with  the 
obstacles  expected  to  be  negotiated  by  the  vehicle. 
The  simulation  capability  is  anticipated  to  be  useful 
in  gauging  vehicle  mobility  early  in  the  design  phase, 
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Anchor  Depth  vs  Time 


FIGURE  18:  Anchor  with  different  applied  torques  and  a  constant  pullout  force  of  300  N. 


as  well  as  in  testing  navigation/control  strategies  de- 
fined/leamed  on  the  fly  by  small  autonomous  vehicles 
as  they  navigate  uncharted  terrain  profiles. 

In  terms  of  future  work,  the  convergence  issue  in¬ 
duced  by  the  multiscale  attribute  of  the  vehicle-terrain 
interaction  problem  remains  to  be  addressed.  Addi¬ 
tionally,  technical  effort  will  focus  on  extending  the 
entire  algorithm  to  run  on  a  cluster  of  GPU-enabled 
machines,  further  increasing  the  size  of  tractable  prob¬ 
lems.  The  modeling  approach  remains  to  be  aug¬ 
mented  with  a  dual  discrete/continuum  representation 
of  the  terrain  to  accommodate  large  scale  simulations 
for  which  an  exclusively  discrete  terrain  model  would 
unnecessarily  burden  the  numerical  solution. 
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FIGURE  19:  Anchor  with  different  pullout  forces  and  a  constant  torque  of  1000  N-M. 
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FIGURE  20:  Closeup  of  Fig.  19,  Anchor  with  different  pullout  forces  and  a  constant  torque  of  1000  N-M. 
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